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Abstract — We describe message-passing and decimation ap- 
proaches for lossy source coding using low-density generator 
matrix (LDGM) codes. In particular, this paper addresses the 
problem of encoding a Bernoulli^) source: for randomly gener- 
ated LDGM codes with suitably irregular degree distributions, 
our methods yield performance very close to the rate distortion 
limit over a range of rates. Our approach is inspired by the survey 
propagation (SP) algorithm, originally developed by Mezard et 
al. [1] for solving random satisfiability problems. Previous work 
by Maneva et al. [2] shows how SP can be understood as belief 
propagation (BP) for an alternative representation of satisfiability 
problems. In analogy to this connection, our approach is to define 
a family of Markov random fields over generalized codewords, 
from which local message-passing rules can be derived in the 
standard way. The overall source encoding method is based on 
message-passing, setting a subset of bits to their preferred values 
(decimation), and reducing the code. 



I. Introduction 

Graphical codes such as turbo and low-density parity check 
(LDPC) codes, when decoded with the belief propagation 
or sum-product algorithm, perform close to capacity [e.g., 
3]. Similarly, LDPC codes have been successfully used for 
various types of lossless compression schemes [e.g., 4]. One 
standard approach to lossy source coding is based on trellis 
codes and the Viterbi algorithm. The goal of this work is to 
explore the use of codes based on graphs with cycles, whose 
potential has not yet been fully realized for lossy compression. 
A major challenge in applying such graphical codes to lossy 
compression is the lack of practical (i.e., computationally 
efficient) algorithms for encoding and decoding. Accordingly, 
our focus is the development of practical algorithms for 
performing lossy source compression. For concreteness, we 
focus on the problem of quantizing a Bernoulli source with 
p = \. Developing and analyzing effective algorithms for 
this problem is a natural first step towards solving more gen- 
eral lossy compression problems (e.g., involving continuous 
sources or memory). 

Our approach to lossy source coding is based on the dual 
codes of LDPC codes, known as low-density generator matrix 
(LDGM) codes. This choice was partly motivated by an 
earlier paper of Martinian and Yedidia [5], who considered 
a source coding "dual" of the BEC channel coding prob- 
lem. They proved that optimal rate-distortion performance 
for this problem can be achieved using the LDGM duals 



of capacity-achieving LDPC codes, and a modified message- 
passing algorithm. Our work was also inspired by the original 
survey propagation algorithm [1], and subsequent analysis 
by Maneva et al. [2] making a precise connection to belief 
propagation over an extended Markov random field. In recent 
work, Murayama [6] developed a modified form of belief 
propagation based on the TAP approximation, and provided 
results in application to source encoding for LDGM codes 
with fixed check degree two. In work performed in parallel to 
the work described here, other research groups have applied 
forms of survey propagation for source encoding based on 
codes composed of local non-linear "check" functions [7] and 
fc-SAT problems with doping [8]. 

II. Background and set-up 

Given a Ber(i) source, any particular i.i.d. realization y £ 
{0, 1}™ is referred to as a source sequence. The goal is to com- 
press source sequences y by mapping them to shorter binary 
vectors x £ {0, l} m with m < n, where the quantity R := — 
is the compression ratio. The source decoder then maps the 
compressed sequence x to a reconstructed source sequence y. 
For a given pair (y, y), the reconstruction fidelity is measured 
by the Hamming distortion d H {y, y) ■= £ J2i=i \Vi The 
overall quality of our encoder-decoder pair is measured by 
the average Hamming distortion D := E[cIh{Y,Y)]. For the 
Bcr(i) source, the rate distortion function is well-known to 
take the form R(D) = 1 - H(D) for D £ [0,0.5], and 
otherwise. 

Our approach to lossy source coding is based on low-density 
generator matrix codes, hereafter referred to as LDGM codes, 
which arise naturally as the duals of LDPC codes. For a 
given rate R = — < 1, let A be an n X m matrix with 
{0, 1} entries, where we assume rank A = m without loss of 
generality. The low-density condition requires that the number 
of Is in each row and column is bounded. The matrix A 
is the generator matrix of the LDGM, thereby defining the 
code C(A) := {z £ {0, 1}" | z = Ax for some x £ {0, l} m }, 
where arithmetic is performed over GF(2). It will also be 
useful to consider the code over (x, z) given by C(A) := 
{{x,z) £ {0,1}™+™ | z = Ax}. We refer to elements of 
x as information bits, and elements of z as source bits. In 
the LDGM approach to source coding, the encoding phase 
of the source coding problem amounts to mapping a given 



source sequence y <= {0, 1}" to an information vector x(y) G 
{0, l} m . Decoding is straightforward: we simply form y(x) 
.1,7' The challenge lies in the encoding phase: in particular, 
we must determine the information bit vector x such that 
the Hamming distortion —\\y — Ax\\i is minimized. This 
combinatorial optimization problem is equivalent to an MAX- 
XORSAT problem, and hence known to be NP-hard in general. 

It is convenient to represent a given LDGM code, specified 
by generator matrix A, as a factor graph G — (V,C,E), 
where V = {1, . . . , m) denotes the set of information bits and 
C := {1, . . .,n} denotes the set of checks (or equivalently, 
source bits), and E denotes the set of edges between checks 
and information bits. As illustrated in Figure [l] the n source 
bits are lined up at the top of the graph, and each is connected 
to a unique check neighbor. Each check, in turn, is connected 
to (some subset of) the m information bits at the bottom 
of the graph. Note that there is a one-to-one correspondence 
between source bits and checks. We use letters a, b, c to refer 
to elements of C, corresponding either to a source bit or the 
associated check. Conversely, we use letters i,j,k to refer 
to information bits in the set V. For each information bit 
i € V, let C(i) C C denote its check neighbors: C(i) := 
{a G C (a,i) G E}. Similarly, for each check a G C, we 
define the set V(a) := {i G V (a, i) G E}. We use the 
notation V(a) := V(a) U {a} to denote the set of all bits — 
both information and source — that are adjacent to check a. 

III. Markov random fields and decimation with 

GENERALIZED CODEWORDS 

A natural first idea to solving the source encoding problem 
would be to follow the channel coding approach: run the 
sum-product algorithm on the ordinary factor graph, and then 
threshold the resulting log-likelihood ratios (LLRs) at each bit 
to determine a source encoding x(y). Unfortunately, this ap- 
proach fails: either the algorithm fails to converge or the LLRs 
fail to yield reliable information, resulting in a poor source 
encoding. Inspired by survey propagation for satisfiability 
problems [1], we consider an approach with two components: 
(a) extending the factor distribution so as to include not 
just ordinary codewords but also a set of partially assigned 
codewords, and (b) performing a sequence of message-passing 
and decimation steps, each of which entails setting fraction of 
bits to their preferred values. 

More specifically, we consider Markov random fields over 
a larger space of so-called generalized codewords, which are 
members of the space {0, 1, *}™+ m where * is a new symbol. 
As we will see, the interpretation of = * is that the 
associated bit i is free. Conversely, any bit for which Xi G 
{0, 1} is forced. One possible view of a generalized codeword, 
as with the survey propagation and fc-SAT problems, is as an 
index for a cluster of ordinary codewords. We define a family 
of Markov random fields, parameterized by a weight for He- 
variables, and a weight that measures fidelity to the source 
sequence. As a particular case, our family of MRFs includes 
a weighted distribution over the set of ordinary codewords. 
Although the specific extension considered here is natural to us 



(and yields good source coding results), it could be worthwhile 
to consider alternative ways in which to extend the original 
distribution to generalized codewords. 

A. Generalized codewords 

Definition 1 (Check states): In any generalized codeword, 
each check is in one of two possible exclusive states: 

(i) we say that check a S C is forcing whenever none of 
its bit neighbors are free, and the local {0, l}-codeword 
{z a ]Xv(a)) G {0, l} 1+ l v '( a )l satisfies parity check a. 

(ii) on the other hand, check a is free whenever z a = *, and 
moreover Xi = * for at least one i G V(a). 

Note that the source bit z a is free (or forced) if and only if 
the associated check a is free (or forcing). With this set-up, 
our space of generalized codewords is defined as follows: 

Definition 2 (Generalized codeword): A vector (z,x) G 
{0, 1, *}"+"* is a valid generalized codeword when the fol- 
lowing conditions hold: 

(i) all checks a are either forcing or free. 

(ii) if some information bit Xi is forced (i.e., Xi G {0,1}), 
then at at least two check neighbors a G C(i) must be 
forcing it. 

For a generator matrix in which every information bit has de- 
gree two or greater, it can be seen that any ordinary codeword 
(z,x) G C(A) is also a generalized codeword. In addition, 
there are generalized codewords that include *'s in some 
positions, and hence do not correspond to ordinary codewords. 
One such (non-trivial) generalized codeword is illustrated in 
Figure [2 A natural way in which to generate generalized 
codewords is via an iterative "peeling" or "leaf-stripping" 
procedure. Related procedures have been analyzed in the 
context of satisfiability problems [2], XORSAT problems [9], 
and for performing binary erasure quantization [5]. 
Peeling procedure: Given some initial source sequence z G 
{0, 1, *}", initialize all information bits Xi to be forced. 

1) While there exists a forced information bit n with 
exactly one forcing check neighbor a, set n — z a = *. 

2) When all remaining forced information bits have at least 
two forcing checks, go to Step 3. 

3) For any free check z a — * with no free information bit 
neighbors, set z a = ® i£ v( a ) x i- 

When initialized with at least one free check, Step 1 of this 
peeling procedure can terminate in one of two possible ways: 
either the initial configuration is stripped down to the all-* 
configuration, or Step 1 terminates at a configuration such that 
every forced information bit has two or more forcing check 
neighbors, thus ensuring that condition (ii) of Definition |2] is 
satisfied. As noted previously [5], these cores can be viewed 
as "duals" to stopping sets in the dual LDPC. Finally, Step 3 
ensures that every free check has at least one free information 
bit, thereby satisfying condition (i) of Definition |2] 

B. Weighted version 

Given a particular source sequence y G {0,l} n , 
we form a probability distribution over the set 
of generalized codewords as follows. For any 




Fig. 1. Illustration of a generalized codeword for a small 
LDGM. Information bits i and j are both forced; for each, 
the two forcing checks are a and b. The remaining checks 
and bits are all free. 

generalized codeword (z,x) £ {0, l,*} n+m , we define 
the sets n s ° u (z) :— \{i £ {1, . . . , n} | Zi = *}| and 
n™ io (x) :— \{i £ {1, • ■ ■ , m} \ X{ — *}|, corresponding 
to the number of ^-variables in the source and information 
bits respectively. We associate non-negative weights w sou and 
Winfo with the ^-variables in the source and information bits 
respectively. Finally, we introduce a non-negative parameter 
7, which will be used to penalize disagreements between 
the source bits z and the given (fixed) source sequence y. 
Of interest to us in the sequel is the weighted probability 
distribution 

p(z, x; w sou , w info , A) oc Wsou {z) x w"* fo (x) x exp _27dffto ' z) . 

(1) 

Note that for w sou = Wi n f = 0, this distribution reduces to 
the standard weighted distribution over ordinary codewords. 

C. Representation as Markov random field 

We now seek to represent the set of generalized codewords 
as a Markov random field (MRF). A first important observation 
is that state augmentation is necessary to achieve such a 
Markov representation with respect to the original factor 
graph. 

Lemma 1: For positive w sou , Wi n ( , the set of generalized 
codewords cannot be represented as a Markov random field 
based on the original factor graph G where the state space at 
each bit is simply {0, 1, *}. 

Proof: It suffices to demonstrate that it is impossible 
to construct an indicator function for membership in the set 
of generalized codewords as a product of local compatibility 
functions on {0, 1, *}, one for each check. The key is that the 
set of all local generalized codewords cannot be defined only 
in terms of the variables Xyt a y, rather, the validity depends 
also on all bit neighbors of checks that are incident to bits in 
V(a). or more formally on bits with indices in the set 

U i6 y(„){j £V\j£ V(b) for some b £ C(i)}. (2) 

As a particular illustration, consider the trivial LDGM code 
consisting of a single source bit (and check) connected to 
three information bits. From Definition Q an d Definition |2] 
it can be seen that the only generalized codeword is the 
all-* configuration. Thus, any check function used to define 
membership in the set of generalized codewords would have 
to assign zero mass to any other {0, 1, *} configuration. Now 
suppose that this simple LDGM is embedded within a larger 



LDGM code. For instance, consider the check labeled e (with 
source bit z e ) and corresponding information bits {j, k, 1} in 
Figure [2 With respect to the generalized codeword in this 
figure, we see that the local configuration (xj,Xk,xiz e ) = 
(1, *,*,*) is locally valid, which contradicts our conclusion 
from considering the trivial LDGM code in isolation. Hence, 
the constraints enforced by a given check change depending 
on the larger context in which it is embedded. ■ 

Consequently, obtaining a factorization of the distribution 
requires keeping track of variables in the extended set (|2j- 
Accordingly, as in the reformulation of survey propagation 
for SAT problems by Maneva et al. [2], we introduce a new 
variable Pj, so that there is a vector (xj,Pj) associated with 
each bit. To define P i; first let V(i) = V{C(i)) denote the 
power set of all of the clause neighbors C(i) of bit i. (I.e., V(i) 
is a set with 2' C WI elements). The variable Pj takes on subsets 
of C(i), and we decompose it as Pi = PP UP/, where at any 
time at most one of P/ and Pf are non-empty. The variable 
Pi has the following decomposition and interpretation: (a) if 
PP = P/ = 0, then no checks are forcing bit xf, (b) if P, = 
P/ ^ 0, then certain checks are forcing ir, to be one (so that 
necessarily Xi — 1); and (c) similarly, if Pi — P" / 0, then 
certain checks are forcing Xi to be zero (so that necessarily 
Xi = 0). By construction, this definition excludes the case that 
both P 4 ° and P? non-empty at the same time, so that the state 
space ofP has cardinality 2l c WI+2l c «l-l = 2l c «l +1 -l. 

D. Compatibility functions 

We now specify a set of compatibility functions to capture 
the Markov random field over generalized codewords. 

1 ) Variable compatibilities: For each bit index i (or a), let 
and A" denote the weights assigned to the events Xi = 1 

and Xi — respectively. For source encoding, these weights 
are specified as \\ = A? = 1 for all information bits i (i.e., no 
a priori bias on the information bits), so that the compatibility 
function takes the form: 

{1 if Xi = 1 and \Pi\ = |P/| > 2 

1 if x i = 0and \Pi\ = |PP| >2 (5) 
Winfo if Xi = * and p = 

The source bits have compatibility functions of the form 
Mz a ,P a ) = A° if z a = and P\ = {a}; ^ a {z a ,P a ) = A* 
if z a = 1 and P 1 = {a}; and ip a (z a ,Pa) = w sou if z a = * 
and P a = 0. Here A„ := y a exp(7) + (1 - y Q )exp(-7), 
A° := 1/A^, and the parameter 7 > reflects how strongly 
the source observations are weighted. 

2) Check compatibilities: For a given check a, the associ- 
ated compatibility function 4> a {xv{a)i z a, Pv(a)) i s constructed 
to ensure that the following two properties hold: (1) The 
configuration {z a } Uxv( a ) is valid for check a, meaning that 
(a) either it includes no *'s, in which case the pure {0,1} 
configuration must be a local codeword; or (b) the associated 
source bit is free (i.e., z a = *), and Xi — * for at least one 
i £ V(a). (2) For each index i £ V(a), the following condition 
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Fig. 2. Message-passing updates involve five types of messages from bit to check, and five types of messages from check to bit. 
Any source bit z a always sends to its only check a the message 5-vector (ip a (0), ip a (l), 0, 0, w sou ). The message vector in any 
given direction on any edge is normalized to sum to one. 



holds: (a) either a 6 Pj and a forces Xi, or (b) there holds 
a £ Pi and a does not force Xi. 

Pemma 1: With the singleton and factor compatibilities as 
above, consider the distribution p wez ((x, P(x)), (z, P(z))), 
defined as a Markov random field (MRF) over the factor graph 
in the following way: 

Y[lpi{Xi,Pi) Y\^a{Za,Pa)<Pa{x v{a) ,Z a ,P v(a) ). (6) 

Its marginal distribution over (x, z) agrees with the weighted 
distribution 0. 

E. Message-passing updates 

In our extended Markov random field, the random variable 
at each bit node i is of the form (xi, Pi), and belongs to the 
Cartesian product {0,1,*} x [V(i) x {0,1}]. (To clarify the 
additional {0,1}, the variable Pi — Pf U P/ corresponds to a 
particular subset of V(i), but we also need to specify whether 
Pi = Pj° or Pi = Pj 1 .) Although the cardinality of V(i) can is 
exponential in the bit degree, it turns out that message-passing 
can be implemented by keeping track of only five numbers for 
each message (in either direction). These five cases are the 
following: 

(i) (xi =0,o£ Pi): check a is forcing Xi to be equal to 
zero. We say Xi is a forced zero with respect to a, and 
use M°_£ a and for the corresponding bit-to-check 

and check-to-bit messages. 



(ii) (xi — 1, a £ Pi)'~ check a is forcing Xi to be equal to 

one. We say that Xi is a forced one with respect to a, and 

if if 

denote the corresponding messages and M i^. 

(iii) (xi = 0, ^ Pf C C(i)\{a}): A check subset not 
including a is forcing Xi — 0. We say x\ is a weak zero 
with respect to check a, and denote the messages MPjJ a 

and M 

(iv) (n = 1,0 ^ P/ C C(i)\{a}): A check subset not 
including a forces Xi = 1. We say that that Xi is a 
weaA: one with respect to check a, and use corresponding 
messages M/™ and M^. 

(v) (xj = *,P/ = PP = 0): No checks force bit xf, 
associated messages are denoted by Mf_ >a and M*_ >a . 

The differences between these cases is illustrated in Figure [l] 
The information bit Xi = is a forced zero with respect to 
checks a and b (case (i)), and a weak zero with respect to 
checks d and / (case (iii)). Similarly, the setting Xj = 1 is a 
forced one for checks a and b, and a weak one for checks c and 
e. Finally, there are a number of * variables to illustrate case 
(v). With these definitions, it is straightforward (but requiring 
some calculation) to derive the BP message-passing updates as 
applied to the generalized MRF, as shown in Figure [2] It can 
be seen that this family of algorithms includes ordinary BP 
as a special case: in particular, if w sou = lUinfo = 0, then the 
updates reduce to the usual BP updates on a weighted MRF 
over ordinary codewords. 



F. Decimation based on pseudomarginals 

When the message updates have converged, the sum-product 
pseudomarginals (i.e., approximations to the true marginal 
distributions) are calculated as follows: 

W (o) « A»{n [c+^i- n m ^ 

aeC(i) aGC(i) 

- e <ii n 

b€C(i) aeC(i)\{6} 

fii(*) OC W inio JJ M*_^. 

aeC(i) 

with a similar expression for The overall triplet is 

normalized to sum to one. As with survey propagation and 
SAT problems [1], [2], the practical use of these message- 
passing updates for source encoding entail: (1) Running the 
message-passing algorithm until convergence; (2) Setting a 
fraction of information bits, and simplifying the resulting 
code; and (3) Running the message-passing algorithm on the 
simplified code, and repeating. We choose information bits to 
set based on bias magnitude Bi := \/J,i(l) — Hi(0)\. 

IV. Results 

We have applied a C-based implementation of our algorithm 
to LDGM codes with various degree distributions and source 
sequences of length n ranging from 200 to 100, 000. Although 
message-passing can be slow to build up appreciable biases for 
regular degree distributions, we find that biases accumulate 
quite rapidly for suitably irregular degree distributions. We 
chose codes randomly from irregular distributions optimized 1 
for ordinary message-passing on the BEC or BSC using 
density-evolution [3]. Figure [3] compares experimental results 
to the rate-distortion bound R(D). We applied message- 
passing using a damping parameter a = 0.50, and with 
w sou = 1.10, Winfo = 1.0 and 7 varying from 1.45 (for rate 
0.90) to 0.70 (for rate 0.30). Each round of decimation entailed 
setting all information bits with biases above a given threshold, 
up to a maximum of 2% of the total number of bits. As seen 
in Figure [3J the performance is already very good even for 
intermediate block length n = 10, 000, and it improves for 
larger block lengths. After having refined our decimation pro- 
cedure, we have also managed to obtain good source encodings 
(though currently not quite as good as Figure[5} using ordinary 
BP message-passing (i.e., w sou = w- m { = 0) and decimation; 
however, in experiments to date, in which we do not adjust 
parameters adaptively during decimation, we have found it 
difficult to obtain consistent convergence of ordinary BP (and 
more generally, message-passing with w S0U ,Wi n { w 0) over 
all decimation rounds. It remains to perform a systematic 
comparison of the performance of message-passing/decimation 
procedures over a range of parameters (u> sou , w- m f , 7) for a 
meaningful quantitative comparison. 

'This choice, though not optimized for source encoding, is reasonable in 
light of the connection between LDPC channel coding and LDGM source 
coding in the erasure case [5]. 



Rate-distortion performance 




Fig. 3. Plot of rate versus distortion, comparing the Shannon 
limit (solid line) and empirical performance using LDGM 
codes with blocklength n — 10, 000. Each diamond is the 
average distortion over 15 trials. 



There remain various open questions suggested by this 
work. For instance, an important direction is developing meth- 
ods for optimizing LDGM codes, and the choice of parameters 
in our extended MRFs for source encoding. An important 
practical issue is to investigate the tradeoff between the con- 
servativeness of the decimation procedure (i.e., computation 
time) versus quality of source encoding. Finally, the limiting 
rate-distortion performance of LDGM codes is a theoretical 
question that (to the best of our knowledge) remains open. 
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